Prospective epigenome and transcriptome analyses of cord and peripheral blood from preterm infants at risk of bronchopulmonary dysplasia

Bronchopulmonary dysplasia (BPD) is a prevalent chronic lung disease of prematurity with limited treatment options. To uncover biomarkers of BPD risk, this study investigated epigenetic and transcriptomic signatures of prematurity at birth and during the neonatal period at day 14 and 28. Peripheral blood DNAs from preterm infants were applied to methylation arrays and cell-type composition was estimated by deconvolution. Covariate-adjusted robust linear regression elucidated BPD- and prolonged oxygen (≥ 14 days) exposure-associated CpGs. RNAs from cord and peripheral blood were sequenced, and differentially expressed genes (DEGs) for BPD or oxygen exposure were determined. Estimated neutrophil–lymphocyte ratios in peripheral blood at day 14 in BPD infants were significantly higher than nonBPD infants, suggesting an heightened inflammatory response in developing BPD. BPD-DEGs in cord blood indicated lymphopoiesis inhibition, altered Th1/Th2 responses, DNA damage, and organ degeneration. On day 14, BPD-associated CpGs were highly enriched in neutrophil activation, infection, and CD4 + T cell quantity, and BPD-DEGs were involved in DNA damage, cellular senescence, T cell homeostasis, and hyper-cytokinesis. On day 28, BPD-associated CpGs along with BPD-DEGs were enriched for phagocytosis, neurological disorder, and nucleotide metabolism. Oxygen supplementation markedly downregulated mitochondrial biogenesis genes and altered CpGs annotated to developmental genes. Prematurity-altered DNA methylation could cause abnormal lymphopoiesis, cellular assembly and cell cycle progression to increase BPD risk. Similar pathways between epigenome and transcriptome networks suggest coordination of the two in dysregulating leukopoiesis, adaptive immunity, and innate immunity. The results provide molecular insights into biomarkers for early detection and prevention of BPD.

Tumor necrosis factor superfamily member 8 Bronchopulmonary dysplasia (BPD) is a prominent chronic respiratory disease among survivors of extremely preterm (< 28 weeks of gestational age, GA) and very low birth weight (BW, < 1500 g) infants 1,2 . The modern consensus on BPD pathogenesis is largely attributed to the arrested lung development due to premature birth which interrupts the intra-uterine programmed lung development and the subsequent repair of lung injury caused by the supplemental oxygen (O 2 ) and mechanical ventilation in the neonatal intensive care unit (NICU) 2 . Lack of definitive cure and the persistent lung impairment can lead to long-term pulmonary dysfunction and increased risk for adverse respiratory symptoms (e.g., airway diseases, exercise intolerance) in BPD survivors [3][4][5][6] . There are limited genetic, epigenetic, or transcriptomic resources for the investigation of BPD etiology and pathogenesis in preterm infants. In previous studies, genome-wide association studies (GWASs) and exome sequencing in various populations with BPD elucidated potential genetic risk factors such as SPOCK2 encoding SPARC (osteonectin), Cwcv and Kazal like domains proteoglycan 2 and CRP encoding C-reactive protein [7][8][9][10][11][12] . Transcriptomic studies by database search, microarray, and RNA sequencing (RNA-Seq) have reported the differentially expressed genes (DEGs) in blood, lung cell or lung biopsy from BPD cases [13][14][15][16][17] . Most recently single cell RNA-Seq determined monocyte (tracheal aspirate)-or CD8 + T cell-specific BPD-associated genes 18,19 . Epigenome-wide association studies (EWASs) have identified DNA methylation loci associated with BPD in formalin-fixed lung autopsy/biopsy 15 or with neonatal morbidities including BPD 20 . Recently, we reported epigenetic markers of BPD risk in cord blood DNAs from a preterm cohort 21 , and they included CpGs annotated with genes including cathepsin H (CTSH, cg24847366) and SPOCK2 (cg17958658) 22 . The results indicated that cord blood methylome changes involved in lung and tissue development, cell cycle and leukopoiesis, immunemediated inflammation, T and B cell responses, and platelet activation may precede or indicate risk of BPD development 22 . In that report, we also demonstrated that methylation-based cord blood cell-type composition varied by BW or GA, and that a higher nucleated red blood cell (NRBC) proportion was significantly associated with a lower BW and DNA hypomethylation profile 22 .
In the current study, we expanded cell-type deconvolution of DNA methylation to postnatal peripheral blood from the same preterm cohort 21,22 . Methylome analysis during the newborn period determined potential postnatal epigenome markers of BPD development and RNA-Seq demonstrated BPD-associated DEGs in cord and postnatal peripheral blood cells from the premature infants. In addition, influence of prolonged NICU O 2 supplementation on blood cell DNA methylation and gene expression landscape was elucidated in nonBPD Results Demographics of cohort. A total of 109 premature infants were included in the study after applying exclusion criteria. Table 1 and Additional file 1: Table S1 present the neonatal and maternal characteristics and fetal complications of the prematurely born neonates diagnosed with BPD or without (nonBPD). Diagnosis of BPD was made for 15 preterm infants based on the National Institutes of Health consensus definition and O 2 needs at either 36 weeks postmenstrual age or at 28-56 days of postnatal life based on GA 23 . BPD development was sexually dimorphic with 25% of males (10/40) and 7.2% of females (5/69) diagnosed among these premature infants. Infants diagnosed with BPD were born at significantly lower mean BW (p = 0.002) and GA (p < 0.001) and were supplemented with NICU O 2 for a greater number of days (46.5 ± 4.0, p < 0.001) compared to nonBPD infants (7.8 ± 1.1). Infections were more common in the neonates who developed BPD (Table 1, p = 0.006). The time points (14 days and 28 days of life) used for peripheral blood analysis were chosen preceding symptoms and diagnosis of BPD. www.nature.com/scientificreports/ Methylation-based estimation of blood cell composition. The methylation-based deconvolution model estimated seven cell type percentages in cord and peripheral blood. Of these, six cell types (with the exception of natural killer (NK) cells) were determined to be significantly varied between BPD and nonBPD neonates at one or more time points (Fig. 1A). The percentage of lymphocytes including CD8 + T cells (day 14),  CD4 + T cells (cord blood, days 14 and 28), and B cells (days 14 and 28) were significantly decreased in BPD relative to nonBPD (Fig. 1A). In contrast, percentage of myeloid cells including monocytes (day 28) and granulocytes (days 14 and 28) and NRBCs (days 14 and 28) were significantly higher in BPD than in nonBPD (Fig. 1A). The previously reported cord blood cell-type compositions 22 based on the Houseman deconvolution algorithm 24 were re-evaluated together with postnatal day data using the approach of Koestler et al. (Identifying Optimal  DNA methylation Libraries, IDOL) 25 .
The estimated seven cell-type distribution plotted relative to GA quintiles among all premature infants indicated time-and GA-dependent shifts of cell-type composition during the neonatal period (Fig. 1B). Compared to cord blood, NRBC proportions in peripheral blood from days 14 and 28 were markedly decreased in all GA quintile groups (Fig. 1B). Higher CD4 + T cell percentage in higher GA groups were also evident at all times and a similar trend was shown for percentage of CD8 + T cells on day 14 and percentage of B cells in cord blood and on day 28 (Fig. 1B). In contrast granulocyte proportions, assumed to be largely neutrophils, were higher in lower GA groups on days 14 and 28 (Fig. 1B).
There were significant inverse correlations between GA and granulocyte percentage on both postnatal days in preterm infants (day 14, r 2 = 0.0837, p = 0.0049; day 28, r 2 = 0.1173, p = 0.0016) ( Fig. 2A). GA was also significantly, but positively, associated with the percentage of the sum of all lymphoid subpopulations (CD4 + T, CD8 + T, B cell, NK) on both postnatal days (day 14 r 2 = 0.2193, p = 1.0E−04; day 28, r 2 = 0.1566, p = 2.0E−04) ( Fig. 2A). An increased percentage of granulocytes and a suppressed percentage of lymphocytes was observed in BPD infants relative to nonBPD infants ( Fig. 2A). The neutrophil (granulocyte)-lymphocyte ratio (NLR), reported to be an early predictor of BPD 26,27 , was significantly higher in BPD compared to nonBPD on all neonatal days (Fig. 2B). Examining NLR, we found significant correlations with NICU O 2 days and GA (Fig. 2C). NLRs were lowered by    www.nature.com/scientificreports/ day 28 (Fig. 2B) while the significant positive correlation to NICU O 2 days persisted (Additional file 2: Fig. S1). Overall we observed that higher NLR was correlated with lower GA, longer O 2 supplementation, and associated with development of BPD.
To further explore differences in cell-type proportions, particularly shifts among lymphocyte subtypes from naïve to memory, we utilized the 12 cell-type model of Salas et al. 28 . While this model does not include NRBCs and cannot distinguish NRBCs from myeloid lineage cell types, it estimates naïve and memory composition of B cells, CD4 T cells and CD8 T cells. BPD infants showed significant increases in memory B cells at both day 14 and day 28 (p < 0.01, Additional file 2: Table S2), significant decreases in CD4 and CD8 naïve cell types at day 14 (p < 0.05). This model also indicated a highly significant increase in NLR for BPD infants at both day 14 and day 28 (p < 0.01).
Cord blood genes differentially expressed in BPD and comparison with EWAS. RNA-seq was carried out on cord blood RNA, and DESeq2 analysis adjustment for nine covariates (GA, BW, sex, and six cell-type percentages of CD4 + T, CD8 + T, B cell, granulocyte, monocyte, and NRBC) determined DEGs in BPD infants (471 genes at p < 0.01; 1685 genes at p < 0.05) ( Table 2; Additional file 1: Table S3; Additional file 2: Fig. S2A). Pathway analysis indicated that 471 DEGs were predominantly involved in the inhibition of lymphopoiesis, T cell receptor signaling and Th1/Th2 pathways, and adaptive immune response (e.g., CD27, CD79B, ICOS, IL18R1, LCK, STAT4) ( Fig. 3A; Table 2; Additional file 1: Table S4), consistent with the decreased T and B cell quantity in BPD infants as depicted in Fig. 1A. The observation of the decreased pyroptosis pathway (e.g., CASP9, GZMA), activated antiviral response and interferon (IFN) signaling (e.g., IFIM3, SOCS1, TLR4) and mitotic cell cycle (e.g., BORA, RAD51, SPC25) ( Fig. 3A; Additional file 1: Table S4) coordinately indicated inflammatory cell proliferation against infection, agreeing with the increased NLR found in BPD infants (shown in Fig. 2B). These and other pathways including development and respiratory disorders showed similar enrichment in the cord blood BPD-EWAS CpGs 22 . Parallel analysis of epigenome and transcriptome pathways in cord blood cells suggests an epigenetic modulation of the T cell autocrine lymphokine IL-2 29 , which could be an upstream event to manage transcriptional changes for T cell clonal expansion and growth, leukocyte activation, and IL-18 stimulation for IFN production and NK cell cytotoxicity (Fig. 3B). The 134 genes in common between BPD-CpGs and BPD-DEGs (e.g., CCL5, HLA-DQA2, IL18RAP, LCK, SELP) may play key roles in granulocyte inflammation and innate and adaptive immune responses (Additional file 1: Table S3; Additional file 2: Fig. S2B). Lung alveolarization-and BPD-associated SPOCK2 7,30 and AGER encoding receptor for advanced glycosylation end-product (RAGE) 31,32 were also elucidated in both analyses. Many of the overlapping genes showed reciprocal changes in methylation and gene expression differences indicating epigenetic regulation of their transcription.

BPD-associated CpGs and DEGs in peripheral blood cells on postnatal day 28. BPD diagnosis
was associated with 116 CpGs at BF (2299 CpGs at FDR 1% cut-off) on day 28 ( Fig. 5A; Table 3; Additional file 1: Table S8) and DNA hypomethylation (demethylation of ~ 80% of CpGs) was evident in infants who developed BPD, similarly to the result on day 14 (Additional file 2: Fig. S4A). Pathways of the BPD-CpG-annotated genes were similar to those seen on day 14 and tissue morphology, neuron and body development, and cellular assembly (e.g., CACNB2, CHD7, FGF8, HOXB6, PAX2) were noted ( Fig. 5B; Additional file 1: Table S9). In addition,  Table S10; Additional file 2: Fig. S4B). Enriched pathways of DEGs on day 28 were similar to those on day 14 such as chromosome segregation and DNA damage (e.g., BLM, CHEK1), activation of phagocytosis and degranulation (e.g., GNLY, MPO, LTF), leukocyte activation and immune response (e.g., IL23A, PIK3R3, RAG1, TRDC), neurological disorder (e.g., ATP6V1A, DUSP), and lipid metabolism (e.g., ACACA , SCD) ( Fig. 5B; Additional file 1: Table S9). The epigenome-transcriptome common 30 genes on day 28 were also enriched in these pathways (Additional file 2: Fig. S4C), indicating the transcriptome changes are parallel with epigenome changes on day 28 in infants who are developing BPD. Overall, this indicated that altered leukocyte activation and antimicrobial response, DNA damage and nucleotide metabolism, lipid metabolism, and angiogenesis and other growth-related events continue on day 28 in premature infants who went on to develop BPD.
Comparison of epigenomic and transcriptomic pathways influenced by O 2 supplementation showed similarities in neuron and tissue development signaling pathways (e.g., TGF-β, NGF, ID-1), cellular assembly, xenobiotic metabolism, cellular senescence, reactive oxygen species production, cell and organ death, and inflammation and infection (Fig. 6C). The pathway activation z-scores of the methylation changes were, in general, inversely related to those of gene expression changes in these functions.

Discussion
This study analyzed DNA methylation and gene expression profiles of cord and venous blood during the 1st month after premature birth and reported on numerous molecular and cellular differences observed in neonates later diagnosed with BPD (summary in Fig. 7). We observed that GA was significantly correlated with decreased proportions of lymphocytes and increased proportions of granulocytes in both cord and peripheral blood. These changes were also reflected in the NLR, particularly in the extremely preterm neonates, including the infants who ultimately developed BPD. NLR is easily assessed by standard blood count methods and the NLR in the first weeks of life could indicate babies at high risk for later developing BPD as has been suggested 27 . The present findings are consistent with this notion as we observed greater NLR in BPD babies at each of the time points studied.
The NLR is likely to be driven by inflammation, and both systemic and lung inflammation by antenatal infection, chorioamnionitis (e.g., rubella, Ureaplasma species), and postnatal sepsis (e.g., E. coli, group B streptococcus) have been reported to increase BPD risk [33][34][35] . The NLR and the pathway enrichment in: interferon signaling (cord blood and day 14 methylome and transcriptome); neutrophil degranulation (day 14 methylome, day 28 transcriptome), hyper-cytokinemia (day 14 transcriptome); complement system activation (day 14 methylome); and phagocytosis (day 28 methylome and transcriptome), collectively suggested microbial infection, neutrophilic inflammation, and altered innate immune responses in infants who developed BPD. Key genes in the innate immune response that were differentially expressed or methylated included GNLY and granzyme (GZMA), the antimicrobial peptides from killer lymphocyte granules causing microptosis 36 . In addition, antimicrobial lactoferrin (LTF), neutrophilic myeloperoxidase (MPO), anti-streptococcus AGER and cathepsin D (CTSD), interferon alpha 1 (IFNA1), interferon induced transmembrane proteins (IFITM1, IFITM3), and the negative regulator of interferon response (NPIR) are likely to be important players in innate immune modulation of BPD pathogenesis. This is consistent with our observation of a higher frequency of infection in the BPD infants (Table 1), including sepsis and pneumonia (Additional file 1: Table S1), and also with an adaptive immune response as indicated by apparent shifts from naïve to memory among B cells, CD4 T cells and CD8 T cells (Additional file 1: Table S2).
Consistent with the lowered percentages of lymphocytes (CD4 + T, CD8 + T, B cells) in BPD compared to nonBPD infants, EWAS and RNA-seq coordinately suggested suppressed T cell immunity in BPD infants (Fig. 7). Mucosal immune imbalance due to lymphocyte insufficiency may mediate respiratory infection and inflammation, and play a role in lung damage and aberrant development. O 2 -induced DNA damage and cell cycle arrest during the neonatal period (Figs. 4C; 5B) may also contribute to decreased lymphopoiesis in BPD. Immune cell senescence due to oxidative stress and telomere shortening was also suggested by telomere-related epigenetic (e.g., TERT, TNKS) or transcriptomic (e.g., PINX1, TERF2) changes in BPD cases on day 14. In cord blood, epigenetic inhibition of IL-2 may affect downstream transcriptomics leading to improper lymphocyte differentiation and activation in BPD (Fig. 3B). IL-2 from CD4 + T and CD8 + T cells is a powerful modulator of T cell growth, leukocyte activation, and IL-18 stimulation of IFN production and NK cell cytotoxicity 29 . Downregulation of immune surface marker genes related to T cell maintenance and T and B cell activation (e.g., CD28, CD27, CD7, CD79B, CD83) in cord and/or peripheral blood further supports skewed B and T cell immunity in preterm infants who developed BPD. In addition, decreased expression of tumor necrosis factor superfamily member 8 (TNFSF8) encoding CD153, one of the 7 BPD-DEGs in common to all timepoints, suggested interruption of T cell-dependent anti-mycobacterial immune response 37 and also class switch DNA recombination and immunoglobulin production 38 in BPD pathogenesis. T cell inadequacy has been suggested as an immunological clue for BPD prediction. For example, a downregulation of T cell receptor signaling was reported in postnatal blood (5-28 days) from preterm BPD babies 14,19 . Scheible et al. 39 reported that premature infants had a lowered level of CD31 + CD4 + T cells on discharge and showed increased risk for respiratory complications at 1 year of age compared to full-term babies 37,38 .
In addition to the immunological biomarkers, we also observed development-and alveolarization-related epigenetic and transcriptomic markers in cord blood of infants who developed BPD 30,31,40 . Concurrently with BPD epigenome data 22  www.nature.com/scientificreports/ susceptibility gene identified from GWAS 7,30,31 , was reported to be deleterious in BPD development as its overexpression in the lung altered neonatal mouse lung alveolarization and exacerbated hyperoxia-caused alveolar simplification while anti-SPOCK2 antibody treatment in mice improved it 7,30 . AGER is a type 1 alveolar cell differentiation marker 31 and prenatal exposure to nicotine in mice arrested alveolarization and upregulated the AGER signaling pathway 41 . In addition, Ager knock-in mice were generated to study alveolar development and type 1 cell turnover during injury and repair 42 .
A unique aspect of this study was the opportunity to examine the in vivo response of blood cells to O 2 therapy among nonBPD infants. Comparing preterm infants who received at least 14 days of NICU O 2 supplementation with those who never received O 2 therapy, we observed a dramatic downregulation of a large set of mitochondriarelated genes, either mitochondrial DNA-encoded (e.g., MT-ND5, MT-CYB, MT-CO2, MT-ATP6) or nuclear DNA-encoded (e.g., CPT1A, LRPPRC, SLC25A24, UCP2). Relative to those who never received O 2 therapy, this down-regulation also occurred in BPD neonates. These genes are involved in mitochondrial DNA-related diseases such as mitochondrial leukoencephalopathy, mitochondrial myopathy, ventricular preexcitation, optic and retinal disorders, and type I diabetes (Additional file 2: Fig. S5B). Other downregulated genes (e.g., BAX, HTT, IGF1R, OGDH, TGFB1) have also been associated with mitochondrial morphology and permeability. Consistent with our observation, Das et al. 43 reported that hyperoxia exposure reduced mitochondrial oxidative phosphorylation complex (I and II) activity in lung epithelial cells. In addition, inhibition of mitochondrial respiration caused arrested alveolarization of neonatal mice 44 , indicating mitochondrial biogenesis is critical in lung maturation. In addition, it has been reported that vascular endothelial mitochondrial function is associated with BPD 45 . We observed mitochondrial gene dysregulation and predicted functional abnormality in blood cells at day 14 of O 2 therapy in both BPD and nonBPD neonates relative to neonates with no O 2 therapy. While down-regulated mitochondrial biogenesis was observed among neonates with O 2 therapy and may be caused by this treatment, it is possible the mitochondrial effects are reflecting the infant's poor blood O 2 saturation in the neonatal period, rather than a response to treatment. Overall, mitochondrial dysfunction may be one of the underlying mechanisms driving BPD in susceptible infants. However, while nonBPD neonates are able to respond and recover from this exposure, the neonates who develop BPD continue to need O 2 therapy for many weeks to maintain optimal blood O 2 saturation levels. It suggests that additional susceptibility factors such as genetic influences or prenatal conditions that lead to a deficit in the response to O 2 therapy are likely to be involved 45 .
Optimal O 2 supplementation in the NICU is essential to reduce the risk of respiratory mortality and other morbidities. However, in addition to BPD, preterm babies who require prolonged O 2 treatment often develop neurological disorders, such as cognitive deficits in the absence of apparent brain injury, and are at higher risk of morbidities including retinopathy of prematurity (ROP) and cardiac disorders 46,47 . Mitochondrial dysfunction due to oxidative stress is associated with adult neurodegenerative disorders 48 and cardiovascular toxicity 49 , and has been proposed as a mechanism in preterm morbidities 50

Conclusions
The small size of the examined cohort and the low incidence of BPD in this study limited the statistical power for many comparisons and therefore, any conclusions should be interpreted with caution. However, despite these limitations, epigenomic and transcriptomic profiling of blood from preterm infants receiving O 2 supplementation revealed time-dependent immunopathological events in the early weeks of life before they were diagnosed with BPD. Altered methylation and dysregulated transcription presumably have affected neutrophil and lymphocyte proportions, T cell and adaptive immune responses, inflammation and phagocytosis, cellular assembly, and repair of DNA damage in premature neonates. Prolonged O 2 treatment highly suppressed mitochondrial gene expression, which may be driving lung pathology and be implicated in a variety of neuronal and optical disorders of prematurity. These molecular and cellular outcomes such as NLR may be predictors of BPD susceptibility and should be followed up in larger studies of BPD. Similarity of pathways and overlapping genes between two genomic networks suggested interplay between epigenetics and transcriptomics in BPD pathogenesis. These results provided insights into mechanisms of BPD and warrant further investigation into the clinical relevance of blood methylomic and transcriptomic markers of BPD risk.   www.nature.com/scientificreports/ Table 4. Representative differentially expressed genes (DEGs) in peripheral blood cells of preterm infants with bronchopulmonary dysplasia (BPD). A total of 731 DEGs (p < 0.01or 312 DEGs (p < 0.05), 1 DEG at FDR 10%) significantly varied between BPD and nonBPD-exposed to any day(s) of oxygen (O 2 ) on postnatal days 14 or 28, respectively, determined by RNA-sequencing analysis. FC expression difference in BPD vs. nonBPD-AnyO 2 . Full lists of the DEGs in Additional file: Tables S6 and S10. *Genes overlapped with those annotated to epigenome-wide association study of the corresponding time.

Methods
Study cohort. The Discovery-Bronchopulmonary Dysplasia Program (D-BPD) cohort was described in previous publications 21,22 and much of the methods overlap with the present work. In brief, in the parent study, 378 preterm infants < 1500 g of birth weight in Buenos Aires, Argentina, were recruited within 13 days of life and followed prospectively in the NICU until discharged or 44 weeks of corrected GA 21,22 . A diagnosis of BPD was made for infants who received at least 28 days of O 2 (> 21%) supplementation therapy and need for O 2 (≥ 30%) and/or positive pressure (1) at 36 weeks of PMA or at discharge (whichever comes first) if born < 32 weeks GA or (2) at 28-56 days postnatal age or at discharge (whichever comes first) if born ≥ 32 weeks GA, as defined in Jobe and Bancalari 23 . A total of 109 patients (15 BPD, 94 nonBPD) satisfying all study inclusion criteria provided a cord and/or peripheral blood sample for methylation arrays and/or RNA-Seq analysis (Table 1; Additional file 1: Table S1). The study methods were performed in accordance with the relevant guidelines and regulations and the design was approved by each of the local Institutional Review Boards (IRB) in Buenos Aires and at the NIEHS (08-E-N159). Parents provided written informed consent as described elsewhere 21,22 .    Overlapping genes annotated to differentially methylated CpGs determined by EWAS or differentially expressed genes determined by RNA-sequencing (RNA-Seq) between different times of samples. Common 103 genes from extended list (false discovery rate 1%) from EWAS were mainly associated with organ morphology and neonatal death, inflammation, and molecular transport. RNA-Seq elucidated seven overlapping genes (at p < 0.05) increased (↑) or decreased (↓) in BPD compared to nonBPD. (B) BPD-associated pathways and gene ontologies predicted from epigenome and transcriptome changes were compared by time. www.nature.com/scientificreports/ Genomic DNA and total RNA extraction. Cord or peripheral blood samples were collected at birth and at day 14 and day 28 of life, placed in PAXgene reagent (Qiagen Inc., Valencia, CA) and then snap frozen at -80 °C. The PAXgene Blood miRNA Kit (PreAnalytix/Qiagen) was used following the manufacturer's procedure. As in Wang et al. 22 blood specimens were incubated at room temperature for > 2 h to lyse RBCs and centrifuged (3500 g, 15 min) to acquire cell pellets. Pellets were washed and treated with proteinase K at 55 °C (800 rpm, 15 min), and isopropanol was added to the soluble fractions of the supernatants prepared from the QiaShredder spin columns. For RNA isolation, the isopropanol precipitants were added into the PAXgene RNA spin columns and processed for DNase treatment followed by RNA extraction procedures as indicated in the manufacturer's brochure. As in Wang et al. 22 , for DNA isolation, the isopropanol precipitants prepared with the PAXgene miRNA Kit were loaded into the DNeasy Mini spin columns (DNeasy Blood & Tissue Kit, Qiagen) and followed the manufacturer's procedure. DNAs and RNAs were quantified using Qubit (Thermo Fisher, Waltham, MA) and stored at -80 °C until used.
DNA methylation microarray analysis. Aliquots (100-500 ng) of genomic DNA from whole peripheral blood were bisulfite-converted using EZ-96 DNA Methylation MagPrep kits (Zymo Research, Irvine, CA) following the manufacturer's instructions as described previously 22 . Bisulfite-converted DNAs were applied to Human MethylationEPIC BeadChip (Illumina, San Diego, CA), which covers over 850,000 CpG sites in the human genome, at the NIEHS Molecular Genomics Core Laboratory. The raw IDAT files from the EPIC methylation arrays were read into R with the minfi package 51 , and the data was preprocessed with background correction and dye-bias normalization using the preprocessNoob method 52 as in 22 . The champ.runCombat function in ChAMP package 53 was used for batch correction ("Sentrix_ID" and "Sentrix_Position"). DNA methylation data were filtered prior to normalization, using the following exclusion criteria: arrays with > 5% failed probes; CpG probes located on X and Y chromosomes; and any probes containing one or more single-nucleotide polymorphisms having a minor allele frequency ≥ 1% (in EUR population of the 1000 Genomes Project) occurring within 5 nucleotides relative to the CpG site. Probes reported to hybridize to one or more non-target sites in the genome were deleted 54 . There were 775,201 CpG probes remaining after exclusions. DNA methylation array data are deposited in Gene Expression Omnibus (GEO, accession number: GSE225313). To detect potential outlier samples in the methylation dataset we prepared principal component analysis (PCA) plots of methylation data from Day 14, Day 28 and combined (Additional file 2: Fig S6). No methylation samples were excluded.
Methylation-based cord blood and peripheral blood cell type estimation. The percentages of seven blood cell types (CD4 + T, CD8 + T, NK cells, B cells, monocytes, granulocytes, and nucleated red blood cells) were estimated using reference DNA methylation profiles (https:// github. com/ immun ometh ylomi cs/ FlowS orted. CordB loodC ombin ed. 450k) for cord blood and the IDOL deconvolution algorithm 25 . The 12 celltype model of Salas et al. 28 was used to differentiate naïve from memory B cells, CD4 T cells and CD8 T cells.
Differentially methylated loci associated with BPD or NICU O 2 exposure. Identification of differentially methylated probes was accomplished using a robust linear regression analysis of M values (log ratio of beta values) on disease status (BPD vs nonBPD-exposed to ≥ 1 day O 2 in NICU) on methylation values from postnatal day 14 and day 28 with adjustment for infant sex, GA, BW, and seven estimated blood cell type proportions (CD4 + T, CD8 + T, NK, B, monocyte, granulocyte, NRBC). Differentially methylated loci associated with O 2 exposure were identified on day 14 among nonBPD neonates (nonBPD-no O 2 vs nonBPD-≥ 14 days O 2 in NICU) with adjustment for the 10 covariates indicated above. The Winsorize technique (https:// www. rdocu menta tion. org/ packa ges/ DescT ools/ versi ons/0. 99. 44/ topics/ Winso rize) was used in an alternative EWAS analysis to test if outlier data points affected the resulting differentially methylated CpGs. We did 90% winsorization, that is the top 5% of the data is replaced by the value of the data at the 95th percentile and the value of the bottom 5% of the data is replaced by the value of the data at the 5th percentile. The re-analysis of all EWAS comparisons using the Winsorization method produced very similar results for differentially methylated CpGs at the FDR1% level (Additional file 1: Table S13). Differentially methylated CpGs were annotated to genes using the Illumina manifest and also by identifying the nearest transcription start site, which resulted in some CpGs being associated with more than one transcript.
RNA-seq analysis. Aliquots of RNA (250 ng) were used to generate poly-adenylated RNA libraries with TruSeq Stranded Total RNA Ribo-Zero Human Gold kit (Illumina). Samples were indexed with NEXTfle-96 RNA-Seq Barcodes (Bioo-scientific, Austin, TX) and 75 bp single-end sequencing was performed on NovaSeq 6000 platform using S4 flow cell (Illumina) in the NIEHS Epigenomics and DNA Sequencing Core Laboratory. Raw sequencing reads (FASTQ files, 22-150 million reads per sample) were aligned to hg38 using STAR and gene counts were generated with featureCounts using the GENCODE version 39 annotation. Count matrix data were then imported to Partek Flow (Partek Inc., Chesterfield, MO) and PCA was used to visualize outlier samples (Additional File 2: Fig S7). One outlier was removed from the cord blood analysis. Quantification of transcript expression and differential expression analyses were performed using DESeq adjusting for nine covariates (GA, BW, sex, CD4 + T%, CD8 + T%, B%, monocyte%, granulocyte%, NRBC%). DEGs were determined between BPD and nonBPD on day 0 (cord blood) or between BPD and nonBPD exposed to any day(s) of O 2 on days 14 and 28 (peripheral blood) with cutoff for significance at p < 0.05 and/or FDR-adjusted p < 0.1. DEGs between nonBPD exposed no NICU O 2 and nonBPD exposed to ≥ 14 days NICU O 2 with cutoff for significance at FDR-adjusted p < 0.01. Controlled hierarchical cluster analysis by disease status generated heatmaps showing a structure of DEG expression trends and partition of DEGs into different clusters using Partek Flow. RNA-Seq raw data are deposited in GEO (accession number: GSE220135).